Numerical Solutions and Applications of the SIR Model

Allegro Sostenuto
Table of Contents

1. Mathematical Model

(source: https://www.immchallenge.org.au/supporting-resources/what-is-mathematical-modelling)

1.1 Overview

(source: https://www.vcyang.com/model_flowchart/)
A mathematical model is a way to represent and describe real-world systems and phenomena using mathematical structures and relationships. It provides a concrete set of equations, formulae, or algorithms to describe the properties or dynamics of a system. It offers a simplified abstraction of complex situations in reality while retaining accuracy, often reducing the difficulty of understanding and analysis. It enables the interpolation and extrapolation from a discrete set of given data and can offer predictions of the system behavior under various unknown conditions.

1.2 Significance: Modelling Infectious Diseases

(source: https://everyone.plos.org/2020/02/20/mathematical-disease-dynamics/)
The rise of mathematical modeling of infectious diseases dates back to the 18th century when Daniel Bernoulli crafted a formula for the endemic prevalence of susceptibles for smallpox. Modern mathematical models can predict the trajectory and spread of infectious diseases, providing information on the likely outcome of an outbreak to assist public health policymakers in implementing measures to control and prevent epidemics.

1.3 SIR Model

(source: https://community.wolfram.com/groups/-/m/t/1888335)

1.3.1 Overview and Setup

The SIR model is a basic model that compartmentalizes the population into three categories: susceptible, infectious, and recovered, denoted by S, I, and R, respectively.
Let us denote:
proportion of susceptibles at time t;
proportion of infectives at time t;
proportion of recovered/removed at time t;
the rate of infection when contacting an infective;
the rate of recovery of infected group.

1.3.2 Equations and Biological Meaning

We model the changes of the aforementioned functions as follows:
The term is the proportion of susceptible individuals at a given time who become infected. The change in the proportion of susceptible individuals is therefore the negative of that term since it is constantly losing that proportion to infection.
Since the only infectives can recover from the disease, the total proportion of the recovered individuals can only depend on how many of infectives have undergone recovery, and the change of the proportion of the recovered individuals at a given time is only dependent on the proportion of the infective individuals and the rate of recovery γ.
One of the major assumptions of the SIR model is that the total population N is constant, namely. This will only be true if at all times, there are no individuals leaving or joining the total population. Practically speaking, this only applies when there is no net change in the size of the total population.
The basic reproduction rate is defined as , and it represents the predicted number of people that will become infected due to the presence of a single infective, assuming the entire population except that already-infected individual is susceptible to the disease. A greater than 1 means that all infective individuals have the capability of causing more than 1 infection, ultimately resulting a wider spread of the disease.
In a research published in late July, 2020, the basic reproduction rate of COVID is calculated to be around 4.22, which is significantly greater than 1. (source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7385940/)

2. Simulation: a Simple, Hypothetical Situation

2.1 Scenario Description

Consider the following situation: people in a small town (total population: 10000) are experiencing the flu season, and today's record shows that 10 people have already recovered, and 200 people are having the flu. Assume that active immunity is obtained once fully recovered, and the newborn and death can be neglected during a short period of time. Here we also assume that the rate of infection when contacting a patient is 75%, and every day a patient has a probability of 15% of becoming fully recovered (rate of recovery).
Our SIR Model will be as follows:
with initial conditions as follows:

2.2 Simulations

Case 1: Base Case (β = 75%, γ = 15%)

S0 = 9790;
I0 = 200;
R0 = 10;
beta = 0.75;
gamma = 0.15;
t_max = 45;
 
N = S0 + I0 + R0;
initial = [S0, I0, R0]./N;
t_span = 0:0.5:t_max;
 
[t, y_abs] = ode45(@(t,initial)simple_SIR(t, initial, beta, gamma), t_span, initial);
 
plot(t, y_abs)
xlabel('time (days)')
ylabel('proportion')
legend('S', 'I', 'R')
title('SIR Model (β = 75%, γ = 15%)')
In the next 45 days, we expect the flu to spread through almost the entire population. The proportion of infectives will surpass the proportion of susceptibles at around day 7, peaking at around day 10. After around day 11, there will be less than 10 percent of the population who hasn't contracted the flu. The basic reproduction number here is .

Case 2: Reduced Infection Rate (β = 10%, γ = 15%)

If the infection rate is only 10%, the basic reproduction number would be , a value less than 1. Below is the simulation:
beta = 0.10;
[t, y_abs] = ode45(@(t,initial)simple_SIR(t, initial, beta, gamma), t_span, initial);
 
plot(t, y_abs)
xlabel('time (days)')
ylabel('proportion')
legend('S', 'I', 'R')
title('SIR Model (β = 10%, γ = 15%)')
As we made evident with the graph, the simulation predicted that the flu has minimal impact on the community due to the reduced infection rate, with only around 5 percent of the population ever contracting the disease.

Case 3: Reduced Infection Rate and Increased Recovery Rate (β = 40%, γ = 25%)

Suppose we can reduce the infection rate to 40% by keeping social distance, and increase the recovery rate to 25% by using stronger medication, our simulation predicts that the flu will have less of an impact than in Case 1:
beta = 0.40;
gamma = 0.25;
[t, y_abs] = ode45(@(t,initial)simple_SIR(t, initial, beta, gamma), t_span, initial);
 
plot(t, y_abs)
xlabel('time (days)')
ylabel('proportion')
legend('S', 'I', 'R')
title('SIR Model (β = 40%, γ = 25%)')
In the next 45 days, we expect the flu to spread across under 65% of the entire population. Due to social distancing, the rate of infection is drastically reduced. As a result of stronger medication, the infective population recovers faster as well. The proportion of infectives will never surpass the healthy one (susceptible + recovered), merely peaking at around 10%. The basic reproduction number here is .

3. Improving the Previous Simulation: Incorporating Randomness

3.1 Objective

In the earlier scenario, we assumed that the infection and recovery rates were constant, which is not entirely realistic. For example, the rate of infection is influenced by various factors such as mask usage, proximity, the health condition of individuals, and more. Instead of introducing new variables to account for these uncertainties, we incorporate randomness into our model to capture these fluctuating elements. Here, we consider the infection rate as a random variable ranging from 30% to 90%, and the recovery rate as a random variable spanning from 10% to 30%. Both rates follow a uniform distribution within their respective ranges.

3.2 Coding

beta_range = [0.3, 0.9];
gamma_range = [0.1, 0.3];
 
figure;
hold on
color = {[0, 0.4470, 0.7410], [0.8500, 0.3250, 0.0980], [0.9290, 0.6940, 0.1250]};
 
for i = 1:20
[t, y_abs] = ode45(@(t,initial)simple_SIR_2(t, initial, beta_range, gamma_range), t_span, initial);
for j = 1:3
plot(t, y_abs(:, j), 'Color', color{j}, 'LineWidth', 0.01)
end
end
 
hold off
title("SIR Model (β ∈ [30%, 90%], γ ∈ [10%, 30%])")
xlabel('time (days)')
ylabel('prportion')
legend('S', 'I', 'R')
Though we incorporated randomness, the trends are still discernible and relatively evident: In the next 45 days, we expect the flu to spread through almost the entire population. At around day 7, the proportion of infectives will peak and surpass the proportion of susceptibles. At around day 25, the proportion of infectives will drop below the proportion of susceptibles, after which both remain under 10 percent. The mean of the basic reproduction number here can be computed as follows:
Let β be drawn from X, the uniform random distribution from ; let be drawn from Y, the inverse distribution from .
We compute the following expected values:
Assuming β and γ are independent, then the expected value for basic reproduction rate is

4. Case Study: Hong Kong Flu

(source: https://phil.cdc.gov/Details.aspx?pid=15874)

4.1 Background

The Hong Kong flu is amongst the deadliest pandemics in history. Originating from Asia, it quickly spread worldwide and affected New York City. We attempt to gain more understanding of this outbreak by analyzing the historical data of weekly totals of excess pneumonia-influenza deaths, which can be understood as the deaths caused by the flu epidemic.

4.2 Coding

4.2.1 Visualizing Given Data

DAT_deaths_week = [14, 28, 50, 66, 156, 190, 156, 108, 68, 77, 33, 65, 24];
data = table((1:13)', DAT_deaths_week');
data = renamevars(data,["Var1", "Var2"],["Week", "Deaths"])
data = 13×2 table
 WeekDeaths
1114
2228
3350
4466
55156
66190
77156
88108
9968
101077
111133
121265
131324
bar(1:13, DAT_deaths_week)
title("Deaths")
xlabel('time (weeks)')
ylabel('population')
The shape of this bar graph roughly corresponds to that of the infectives in our SIR simulation, with the death toll peaking at week 6 with 190 deaths in total. If we assume that on average, about one death occurs per 500 infectives, then we can estimate the size of the infective population as follows:
DAT_infectives_week = DAT_deaths_week.*500;
scatter(1:13, DAT_infectives_week)
title("Est. Infectives")
xlabel('time (days)')
ylabel('population')
This gives us a rough estimate of the number of infectives in New York City during the 13 weeks of data collection. In the following sections, we attempt to model these data.

4.2.2 ODE Approximations

S0 = 7900000-14*500;
I0 = 14*500;
R0 = 0;
N = 7900000;
beta_abs = 0.5;
gamma_abs = 0.6;
t_max = 100;
 
initial = [S0, I0, R0]./N;
t_span = 0:0.5:t_max;
 
 
[t, y] = ode45(@(t,initial)simple_SIR(t, initial, beta_abs, gamma_abs), t_span, initial);
 
plot(t, y)
xlabel('time (days)')
ylabel('proportion')
legend('S', 'I', 'R')
title('SIR Model (β = 50%, γ = 60%)')
It seems like our model suggests that the pandemic has relatively little effect on NYC, with the proportions of all three groups (susceptible, infective, and recovered) remaining staying unchanged. In our simulation, less than 1 percent of the population was ever infected, with a total of around 4000 people ever contracting the disease.
PDE_infectives_day = y(:, 2) * N;
 
plot(t, PDE_infectives_day)
hold on
scatter(0:7:84,DAT_infectives_week)
hold off
xlabel('time (days)')
ylabel('population')
legend('Simulated Infectives', 'Est. Infectives')
title('SIR Model vs Est. Infectives (β = 50%, γ = 60%)')
Our simulation differs drastically with the estimated infectives we calculated from the death toll. While the estimation suggests growth in the population with a peak around day 40 with nearly 100000 individuals infected before we observe a decrease, our model claims that the infective population will never surpass its initial value of 7000 and is strictly decreasing as a function of time. This is entirely unrealistic since after week 6, there are more deaths caused by the Hong Kong flu than our simulated infectives.
Let's look at our horrendous fitting error:
PDE_infectives_week = PDE_infectives_day(1:14:169)';
fitting_err = abs(PDE_infectives_week - DAT_infectives_week)
fitting_err = 1×13
104 ×
0 1.054978807250708 2.330891043284907 3.217381519177569 7.759674101931363 9.480268156953722 7.790383577928999 5.395309402830692 3.397705862128233 3.848880603814871 1.649454670874461 3.249733444281872 1.199869741765539
total_error = sum(fitting_err)
total_error =
5.037453093222293e+05
scatter(1:13, fitting_err)
title('Residual Plots of the Curve Fit')
That is a lot of error, and the residual plot has an obvious pattern (it resembles our Estimated Infectives plot to an uncanny degree). In general, a curve fit is never good if the residual plot does not look randomly distributed. We attempt to improve that below by manually guessing a β that will hopefully give us a better curve fit.
beta = 0.7;
 
[t, y] = ode45(@(t,initial)simple_SIR(t, initial, beta, gamma_abs), t_span, initial);
 
PDE_infectives_day = y(:, 2) * N;
 
plot(t, PDE_infectives_day)
hold on
scatter(0:7:84,DAT_infectives_week)
hold off
xlabel('time (days)')
ylabel('population')
legend('Guess SIR Model', 'Est. Infectives')
title('SIR Model vs Est. Infectives (β = 70%, γ = 60%)')
Now our simulated result looks much closer to the estimated infectives, growing until its peak at around 40 days before declining.
PDE_infectives_week = PDE_infectives_day(1:14:169)';
fitting_err_new = abs(PDE_infectives_week - DAT_infectives_week);
format long
We attempt to minimize the sum of the error:
disp(sum(fitting_err_new))
1.098647834352148e+05
This error is better than our initial guess, but it's still a lot.
Let's look at our residual plot just for fun:
scatter(1:13, (PDE_infectives_week - DAT_infectives_week))
title('Residual Plot of the Curve Fit')
xlabel('time (days)')
ylabel('deviation from Est. Infectives')
Much more random than before, but we can do better.

5. Above and Beyond: Extending and Refining the Previous Methods

5.1 Using fminsearch

Instead of guessing or estimating the infection and recovery rates, we can instead use matlab's built-in fminsearch function to improve accuracy (and save us time):
x0 = [0.69, 0.6];
arr = @(x) (ode45(@(t,initial)simple_SIR(t, initial, x(1), x(2)), t_span, initial).y(2,1:13)* 7900000 - DAT_infectives_week);
fit_abs = @(x) sum(abs(arr(x)));
fit_rsquare = @(x) sum((arr(x)).^2);
result_abs = fminsearch(fit_abs, x0);
result_rsquare = fminsearch(fit_rsquare, x0);
beta_abs = result_abs(1);
gamma_abs = result_abs(2);
beta_rsquare = result_rsquare(1);
gamma_rsquare = result_rsquare(2);
 
[t, y_abs] = ode45(@(t,initial)simple_SIR(t, initial, beta_abs, gamma_abs), t_span, initial);
[~, y_rsquare] = ode45(@(t,initial)simple_SIR(t, initial, beta_rsquare, gamma_rsquare), t_span, initial);
 
PDE_infectives_day_abs = y_abs(:, 2) .* 7900000;
PDE_infectives_day_rsquare = y_rsquare(:, 2) .* 7900000;
 
plot(t, PDE_infectives_day_abs)
hold on
plot(t, PDE_infectives_day_rsquare)
scatter(0:7:84,DAT_infectives_week)
hold off
xlabel('time (days)')
ylabel('population')
legend('Model with min(Σ|residual|)','Model with min(Σ(residual)^{2})', 'Est. Infectives')
title('SIR Models Compared to Est. Infectives')
In the graph above, the blue curve minimizes the sum of the absolute value of the residuals, while the red curve minimizes the sum of the residual squared. The latter is called the least squares regression method which is widely applied in data science, but its sensitivity to outliers are sometimes undesired. Regardless, both curves look similar to the resulting curve of our manual guess and check (which means we had quite a good guess!), growing until its peak at around 40 days before declining just like the estimated infective count.
PDE_infectives_week_abs = PDE_infectives_day_abs(1:14:169)';
PDE_infectives_week_rsquare = PDE_infectives_day_rsquare(1:14:169)';
 
format long
Let's see how much our errors are:
simple_error_abs = sum(abs(PDE_infectives_week_abs - DAT_infectives_week))
simple_error_abs =
1.075762063543975e+05
simple_error_rsquare = sum(abs(PDE_infectives_week_rsquare - DAT_infectives_week))
simple_error_rsquare =
1.073959595583730e+05
A lot less than our manual guess and check! The R square model seems a bit better than the absolute value model.
Let's compare the residual plot, just to check our curve fits: our attempts are successful they seem randomly distributed.
scatter(1:13, (PDE_infectives_week_abs - DAT_infectives_week))
title('Residual Plots of the Curve Fit using Σ|residual|')
xlabel('time (days)')
ylabel('deviation from Est. Infectives')
scatter(1:13, (PDE_infectives_week_rsquare - DAT_infectives_week))
title('Residual Plots of the Curve Fit using Σ(residual)^{2}')
xlabel('time (days)')
ylabel('deviation from Est. Infectives')
Both seems randomly distributed. Success!

5.2 Using SIR with Demography

We now incorporate the birth and death rates into our model with birth rate = mortality rate = μ. We assume that all new-borns are susceptible, and no one can be born infective or recovered. The updated model is as follows:
x0 = [0.7, 0.3, 0.001];
arr = @(x) (ode45(@(t,initial)demography_SIR(t, initial, x(1), x(2), x(3)), t_span, initial).y(2,1:13)* 7900000 - DAT_infectives_week);
Due to personal taste, we shall use the Mean Absolute Method to calculate our error.
fit_abs = @(x) sum(abs(arr(x)));
result_abs = fminsearch(fit_abs, x0);
As a sanity check, we should look at what fminsearch suggests our values for β, γ, and μ should be (what if fminsearch gives a negative death rate??) and see if those values make sense.
beta_abs = result_abs(1)
beta_abs =
0.732428983280519
gamma_abs = result_abs(2)
gamma_abs =
0.635405328823344
mu_abs = result_abs(3)
mu_abs =
6.683361703546827e-04
Good, these are reasonable values. We can proceed with our curve fit:
[t, y_abs] = ode45(@(t,initial)demography_SIR(t, initial, beta_abs, gamma_abs, mu_abs), t_span, initial);
 
PDE_infectives_day_abs = y_abs(:,2).*N;
 
plot(t, PDE_infectives_day_abs)
hold on
scatter(0:7:84,DAT_infectives_week)
hold off
xlabel('time (days)')
ylabel('population')
legend('Model with min(Σ(residual)^{2})', 'Est. Infectives')
title('SIR Models Compared to Est. Infectives')
At this point we should not expect any drastic changes in our curve fit as our model has become pretty accurate, and our simulation indeed looks smilar to before. Notably, we can see that during day 80 and 100, the SIR with demography model predicts a slower decline in infectives than the simple SIR model. This is most likely caused by the positive birth rate leading to a constant increasing effect in the susceptible population and therefore leading to more infectives than before.
PDE_infectives_week_abs = PDE_infectives_day_abs(1:14:169)';
format long
scatter(1:13, (PDE_infectives_week_abs - DAT_infectives_week))
Let's see our error:
demography_error_abs = sum(abs(PDE_infectives_week_abs - DAT_infectives_week))
demography_error_abs =
1.064586268999627e+05
improvement = simple_error_abs - demography_error_abs
improvement =
1.117579454434861e+03
We've further reduced the error by more than 1000, and our residual plot looks random:
title('Residual Plots of the Curve Fit using Σ|residual|')
xlabel('time (days)')
ylabel('deviation from Est. Infectives')

6. Summary and Conclusion

The SIR Model is an incredibly simple yet efficient approximation for a disease that remains contained in a community. By modelling both hypothetical and real pandemics, I gained insight about how neither infection rate nor recovery rate alone can determine the trajectory of a pandemic: it is the ratio of those that provide the best prediction for the spread.
Additionally, I really appreciated the method of incorporating randomness for β and γ in the simulation rather than introducing new variables for mask usage, proximity, the health condition of individuals, etc. However, since some variables are essential to improving the accuracy of our simulation (e.g. birth-rate and death-rate), we must consider the trade-off between accuracy and computational complexity.
While MatLab's fminsearch gave us a tool for finding the coefficients to minimize our error, it is only capable of finding a local minimum and is therefore dependent on the initial guess values. As we incorporate more variables, there will be more local minima and we must consider a different approach to find the true optimal values for our coefficients.
Our SIR Models has the following limitations:
  1. It fails to consider the birth rate and death rate of the population. Even in our SIR with demography model, we assumed that the birth rate and the natural death rate of the population are the same, and that is not the case for countries with a non-zero population growth: positive in the case of India and negative in the case of Japan. While its effect may not be significant in short-term simulations, it might create an impactful difference for epidemics spanning more than a year.
  2. It fails to consider the case where a recovered individual become susceptible again, which is the case for common cold and influenza. We assumed that all individuals can only contract the disease at most once, rendering our model ineffective and inaccurate in modelling diseases capable of repeated infections.
  3. It fails to consider vaccination which will make some susceptible individuals become immune to the disease (and in our case, join the recovered group). Due to advancements in medicine, a lot of pandemics caused by infectious diseases will have vaccination shots that reduce the chance of infection for a lot of susceptible individuals and our model fails to take this into account.
  4. In our SIR model with demography, we assumed that all new-borns are susceptible to the disease, but maternal antibodies will protect them from some disease for some time (most notably measles). Therefore depending on the traits of the disease we wish to study, we need to adjust our model to incorporate the consideration for maternally-derived immunity.

7. Reference and Acknowledgement

7.1 References

ACER. “What Is Mathematical Modelling?” Australian Council for Educational Research - ACER, 2019, www.immchallenge.org.au/supporting-resources/what-is-mathematical-modelling.
“Introducing the Mathematical Modelling of Infectious Disease Dynamics Collection.” EveryONE, 20 Feb. 2020, everyone.plos.org/2020/02/20/mathematical-disease-dynamics/. Accessed 9 Dec. 2023.
Linka, Kevin, et al. “The Reproduction Number of COVID-19 and Its Correlation with Public Health Interventions.” Computational Mechanics, vol. 66, no. 4, 28 July 2020, pp. 1035–1050, https://doi.org/10.1007/s00466-020-01880-8.
MathWorks. “ReferenceEntities.” Mathworks.com, 2019, www.mathworks.com/help/matlab/.
Moreno-Esteva, Enrique Garcia. “An SEIR like Model That Fits the Coronavirus Infection Data - Online Technical Discussion Groups - Wolfram Community.” Community.wolfram.com, 2019, community.wolfram.com/groups/-/m/t/1888335.
Yang, Vicky. “Math Modeling Flowchart.” Www.vcyang.com, Dec. 2016, www.vcyang.com/model_flowchart/. Accessed 9 Dec. 2023.

7.2 Acknowledgement

I would like to thank Dr. Changhan He for his amazing instruction and guidance on this project.
I would also like to acknowledge my amazing classmates for their collaboration. ‌

Appendix

function out = simple_SIR(~, initial, beta, gamma)
dS = -beta * initial(2) * initial(1);
dI = beta * initial(2) * initial(1) - gamma * initial(2);
dR = gamma * initial(2);
 
out = [dS; dI; dR];
end
 
 
function out = simple_SIR_2(~, initial, beta_range, gamma_range)
beta = beta_range(1) + (beta_range(2)-beta_range(1))*rand;
gamma = gamma_range(1) + (gamma_range(2)-gamma_range(1))*rand;
 
dS = -beta * initial(2) * initial(1);
dI = beta * initial(2) * initial(1) - gamma * initial(2);
dR = gamma * initial(2);
 
out = [dS; dI; dR];
end
 
 
function out = demography_SIR(~, initial, beta, gamma, mu)
dS = -beta * initial(2) * initial(1) - mu * initial(1) + mu;
dI = beta * initial(2) * initial(1) - gamma * initial(2) - mu * initial(2);
dR = gamma * initial(2) - mu * initial(3);
 
out = [dS; dI; dR];
end